Cluster Analysis of Data Points using Partitioning and Probabilistic Model-based Algorithms
نویسنده
چکیده
Exploring the dataset features through the application of clustering algorithms is a viable means by which the conceptual description of such data can be revealed for better understanding, grouping and decision making. Some clustering algorithms, especially those that are partitionedbased, clusters any data presented to them even if similar features do not present. This study explores the performance accuracies of partitioning-based algorithms and probabilistic model-based algorithm. Experiments were conducted using kmeans, k-medoids and EM-algorithm. The study implements each algorithm using RapidMiner Software and the results generated was validated for correctness in accordance to the concept of external criteria method. The clusters formed revealed the capability and drawbacks of each algorithm on the data points.
منابع مشابه
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملAssessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories
In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...
متن کاملModeling of a Probabilistic Re-Entrant Line Bounded by Limited Operation Utilization Time
This paper presents an analytical model based on mean value analysis (MVA) technique for a probabilistic re-entrant line. The objective is to develop a solution method to determine the total cycle time of a Reflow Screening (RS) operation in a semiconductor assembly plant. The uniqueness of this operation is that it has to be borrowed from another department in order to perform the production s...
متن کاملSoF: Soft-Cluster Matrix Factorization for Probabilistic Clustering
We propose SoF (Soft-cluster matrix Factorization), a probabilistic clustering algorithm which softly assigns each data point into clusters. Unlike model-based clustering algorithms, SoF does not make assumptions about the data density distribution. Instead, we take an axiomatic approach to define 4 properties that the probability of co-clustered pairs of points should satisfy. Based on the pro...
متن کاملAn Efficient Method of Partitioning High Volumes of Multidimensional Data for Parallel Clustering Algorithms
An optimal data partitioning in parallel/distributed implementation of clustering algorithms is a necessary computation as it ensures independent task completion, fair distribution, less number of affected points and better & faster merging. Though partitioning using Kd-Tree is being conventionally used in academia, it suffers from performance drenches and bias (non equal distribution) as dimen...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014